Attempting to Bypass Alignment from Comparable Corpora via Pivot Language

نویسندگان

  • Alexis Linard
  • Béatrice Daille
  • Emmanuel Morin
چکیده

Alignment from comparable corpora usually involves two languages, one source and one target language. Previous works on bilingual lexicon extraction from parallel corpora demonstrated that more than two languages can be useful to improve the alignments. Our works have investigated to which extent a third language could be interesting to bypass the original alignment. We have defined two original alignment approaches involving pivot languages and we have evaluated over four languages and two pivot languages in particular. The experiments have shown that in some cases the quality of the extracted lexicon has been enhanced.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Cupral 2017 The First Workshop on Curation and Applications of Parallel and Comparable Corpora Proceedings of the Workshop

We propose a novel method to bootstrap the construction of parallel corpora for new pairs of structurally different languages. We do so by combining the use of a pivot language and self-training. A pivot language enables the use of existing translation models to bootstrap the alignment and a self-training procedure enables to achieve better alignment, both at the document and sentence level. We...

متن کامل

Bilingual phrase-to-phrase alignment for arbitrarily-small datasets

This paper presents a novel system for sub-sentential alignment of bilingual sentence pairs, however few, using readily-available machine-readable bilingual dictionaries. Performance is evaluated against an existing gold-standard parallel corpus where word alignments are annotated, showing results that are a considerable improvement on a comparable system and on GIZA++ performance for the same ...

متن کامل

Evaluating a Pivot-Based Approach for Bilingual Lexicon Extraction

A pivot-based approach for bilingual lexicon extraction is based on the similarity of context vectors represented by words in a pivot language like English. In this paper, in order to show validity and usability of the pivot-based approach, we evaluate the approach in company with two different methods for estimating context vectors: one estimates them from two parallel corpora based on word as...

متن کامل

Using English as Pivot to Extract Persian-Italian Parallel Sentences from Non-Parallel Corpora

Ebrahim Ansari ([email protected]) et al. 2017. Using english as pivot to extract persian-italian parallel sentences from non-parallel corpora. In " Applications of Comparable Corpora " edited book Berlin Linguistic Press (ed.). The effectiveness of a statistical machine translation system (SMT) is very dependent upon the amount of parallel corpus used in the training phase. For low-resource l...

متن کامل

Phrase-based statistical machine translation with pivot languages

Translation with pivot languages has recently gained attention as a means to circumvent the data bottleneck of statistical machine translation (SMT). This paper tries to give a mathematically sound formulation of the various approaches presented in the literature and introduces new methods for training alignment models through pivot languages. We present experimental results on Chinese-Spanish ...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2015